Types of number:
Binary is base 2. There are two digits: 0 and 1. One binary digit is a bit, and eight bits make a byte.
To convert a number from decimal to binary, repeatedly write down the remainder when the number is divided by two, then integer-divide the number by two, until it reaches zero. The first remainder written down corresponds to the least significant bit.
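The repeated-division method can be sketched in a few lines of Python. This is only an illustration for non-negative whole numbers; the function name `to_binary` is made up for this example.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to a binary string by repeated division."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # the remainder is the next bit, least significant first
        n //= 2                  # integer divide by two
    # The remainders were collected least significant bit first, so reverse them.
    return "".join(reversed(bits))


print(to_binary(13))  # 1101
```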
Unsigned binary numbers are represented with normal place values:
Unsigned binary numbers with a fractional part can be represented in fixed point binary by having place values of
Signed binary numbers are typically represented with two's complement, where the leftmost place value has a negative value. This means an
To convert a positive number to its two's complement negative equivalent, invert every bit to the left of the least significant 1. This gives the same result as inverting every bit and then adding 1.
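As a rough sketch, the bit-flipping rule can be applied to a string of bits. The helper names `negate_twos_complement` and `twos_complement_value` are hypothetical, and the example assumes a fixed width of 8 bits.

```python
def negate_twos_complement(bits: str) -> str:
    """Negate a two's complement bit string by flipping every bit left of the lowest 1."""
    lowest_one = bits.rfind("1")
    if lowest_one == -1:
        return bits  # zero has no 1 bits and is its own negative
    flipped = "".join("1" if b == "0" else "0" for b in bits[:lowest_one])
    return flipped + bits[lowest_one:]


def twos_complement_value(bits: str) -> int:
    """Read a bit string as a signed two's complement number."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)  # the leftmost place value is negative
    return value


minus_six = negate_twos_complement("00000110")  # +6 in 8 bits
print(minus_six)                         # 11111010
print(twos_complement_value(minus_six))  # -6
```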
Hexadecimal is base 16, where the decimal values 10-15 are represented by the digits A-F. Each hex digit corresponds to 4 binary digits, which makes hex easier than binary for humans to read, remember, and type.
ASCII (American Standard Code for Information Interchange) uses 7 bits to represent all the characters on a standard English-language keyboard, including all uppercase and lowercase Latin characters, digits, and some symbols. The first 32 codes are non-printable control characters.
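The correspondence between one hex digit and four bits can be illustrated with a short sketch. The function name `binary_to_hex` is an assumption for this example; Python's built-in `hex()` does a similar job on integers.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a bit string to hex by translating each group of 4 bits."""
    # Pad on the left with zeros so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    digits = "0123456789ABCDEF"
    return "".join(digits[int(padded[i:i + 4], 2)] for i in range(0, len(padded), 4))


print(binary_to_hex("10101111"))  # AF (1010 is A, 1111 is F)
```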
Numbers in ASCII are not represented as their actual binary number value - the digit '3' is represented by a binary character code equivalent to 51 in decimal.
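A quick check in Python shows the difference between a digit character's code and its numeric value. The helper names `ascii_code` and `digit_value` are made up for this illustration.

```python
def ascii_code(ch: str) -> int:
    """Return the character code of a single character."""
    return ord(ch)


def digit_value(ch: str) -> int:
    """Return the numeric value of a digit character by subtracting the code for '0'."""
    return ord(ch) - ord("0")


print(ascii_code("3"))                 # 51
print(format(ascii_code("3"), "07b"))  # 0110011 (the 7-bit pattern)
print(digit_value("3"))                # 3
```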
Unicode was created to allow more alphabets for different languages to be represented, including Greek, Arabic, and Cyrillic. UTF-16 and UTF-32 use 16 and 32 bits respectively per character[1]. UTF-8 is variable width, using between 8 and 32 bits per character. It is the most commonly used Unicode encoding.
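The variable width of UTF-8 can be seen by encoding a few characters and counting the bytes. This is a small sketch; the function name `utf8_bits` and the sample characters are chosen only for illustration.

```python
def utf8_bits(ch: str) -> int:
    """Return how many bits a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8")) * 8


samples = [
    "A",           # Latin capital letter A
    "\u00e9",      # Latin small letter e with acute accent
    "\u20ac",      # euro sign
    "\U0001F600",  # an emoji
]
for ch in samples:
    print(f"U+{ord(ch):04X} uses {utf8_bits(ch)} bits in UTF-8")
```

The four samples take 8, 16, 24, and 32 bits respectively.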
A parity bit is an additional bit that is set at transmission and checked on receipt to identify whether an error occurred in transmission. Under odd parity, the parity bit is set so the total number of 1s per byte is odd. Similarly, under even parity, the parity bit is set so the total number of 1s per byte is even.
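A minimal sketch of setting and checking a parity bit is given here. It assumes 7 data bits with the parity bit appended at the end to make a byte; real systems may place the parity bit elsewhere, and the function names are hypothetical.

```python
def add_parity(bits7: str, even: bool = True) -> str:
    """Append a parity bit to 7 data bits so the byte has even (or odd) parity."""
    ones = bits7.count("1")
    parity = ones % 2 if even else (ones + 1) % 2
    return bits7 + str(parity)


def check_parity(byte: str, even: bool = True) -> bool:
    """Return True if the received byte has the expected parity."""
    expected_remainder = 0 if even else 1
    return byte.count("1") % 2 == expected_remainder


sent = add_parity("1000001")       # two 1s already, so the even parity bit is 0
print(sent)                        # 10000010
print(check_parity(sent))          # True
print(check_parity("11000010"))    # False: one bit was flipped in transit
```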
In majority voting, each bit is sent an odd number of times, with a minimum of three. The recipient assumes that the most commonly received value of each bit is correct, so a single flipped copy is corrected.
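Majority voting can be sketched as follows, assuming each bit is repeated three times in a row. The function names `send_with_repetition` and `majority_decode` are invented for this example.

```python
def send_with_repetition(bits: str, copies: int = 3) -> str:
    """Repeat every bit `copies` times before transmission."""
    return "".join(b * copies for b in bits)


def majority_decode(received: str, copies: int = 3) -> str:
    """Recover each bit by taking the most common value in its group of copies."""
    decoded = []
    for i in range(0, len(received), copies):
        group = received[i:i + copies]
        decoded.append("1" if group.count("1") > copies // 2 else "0")
    return "".join(decoded)


print(send_with_repetition("101"))   # 111000111
print(majority_decode("110000111"))  # 101 (the flipped third bit is outvoted)
```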
A checksum is a number that is calculated using an algorithm, and transmitted along with a block of data. The recipient then re-calculates the checksum based on what they receive, and compares against the received checksum. If they do not match, an error has occurred.
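As an illustration, one very simple checksum algorithm adds up all the byte values and keeps the result modulo 256. This particular algorithm is an assumption chosen for brevity; real protocols typically use stronger ones.

```python
def checksum(data: bytes) -> int:
    """Sum all byte values and keep the result modulo 256."""
    return sum(data) % 256


def verify(data: bytes, received_checksum: int) -> bool:
    """Recalculate the checksum on receipt and compare it with the one that was sent."""
    return checksum(data) == received_checksum


block = b"hello"
sent_checksum = checksum(block)
print(verify(b"hello", sent_checksum))  # True
print(verify(b"hellp", sent_checksum))  # False: the data changed in transit
```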
A check digit is a digit at the end of a string of numbers, typically used to check for errors in human-inputted data, e.g. ISBN numbers. It can be calculated by multiplying each digit by a weight, adding the results, and taking the sum modulo a fixed number.
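As one concrete case, the ISBN-13 check digit uses alternating weights of 1 and 3 and works modulo 10. The sketch below follows that scheme; the function names are made up for this example.

```python
def isbn13_check_digit(first12: str) -> int:
    """Calculate the ISBN-13 check digit from the first 12 digits."""
    total = 0
    for position, ch in enumerate(first12):
        weight = 1 if position % 2 == 0 else 3  # weights alternate 1, 3, 1, 3, ...
        total += int(ch) * weight
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check a full ISBN-13 (hyphens allowed) against its final check digit."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(isbn13_check_digit("978030640615"))    # 7
print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("978-0-306-40615-8"))  # False
```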
An ADC (analogue to digital converter) is used with analogue sensors to convert measurements into discrete binary signals.
A DAC (digital to analogue converter) converts a digital signal to an analogue output. They are commonly used to convert a digital audio signal into an analogue signal.
A bitmap image is stored as an array of pixels, each of which has a binary value that encodes the colour of that pixel.
Properties of bitmap image files include:
Vector images are represented by a list of geometric objects. Properties of each object are stored, such as position, size, fill colour, line colour, and line weight. Vector images can have smaller file sizes, but are only suitable for graphics such as logos, not real-life photographs. Individual objects can be easily edited in vector graphics.
Sound is represented digitally by a series of discrete samples, each with a discrete binary amplitude value. An analogue to digital converter samples the analogue microphone signal at regular intervals, and each sample is quantised to a discrete binary value and stored.
Properties of sound files include:
Nyquist's Theorem states that in order to produce an accurate recording, the sampling rate must be at least double the highest frequency in the sound being recorded.
MIDI is a protocol that lets electronic musical instruments and computers communicate. A MIDI controller sends event messages that specify notes and contain details such as the pitch, duration, timbre, voice, and velocity of a note. A MIDI file contains a list of MIDI event messages, which are used to synthesise the sound. Individual notes can easily be manipulated, and it is easy to apply changes such as playing in a different key.
Compression reduces the size (in bytes) of a file. This reduces the storage space needed, reduces bandwidth used when transmitting a file, and reduces file transfer times.
Lossy compression works by removing non-essential information. Some information is permanently lost. For example, MP3 files use lossy compression that removes sounds too high to hear, and quieter sounds that are played at the same time as louder sounds.
Lossless compression works by recording patterns in data rather than the actual data. All the original information can be exactly recovered. Lossless compression is suitable for text, but results in much larger file sizes than lossy compression for things such as images.
Run length encoding involves storing the length and value of 'runs' of the same consecutive value.
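A minimal sketch of run length encoding on a string follows. Storing each run as a `(length, value)` pair is one possible layout, and the function names are hypothetical.

```python
def rle_encode(data: str) -> list[tuple[int, str]]:
    """Store each run of identical consecutive characters as (length, value)."""
    runs = []
    for ch in data:
        if runs and runs[-1][1] == ch:
            runs[-1] = (runs[-1][0] + 1, ch)  # extend the current run
        else:
            runs.append((1, ch))              # start a new run
    return runs


def rle_decode(runs: list[tuple[int, str]]) -> str:
    """Rebuild the original data by repeating each value by its run length."""
    return "".join(ch * count for count, ch in runs)


encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [(4, 'A'), (3, 'B'), (2, 'C'), (1, 'D')]
print(rle_decode(encoded))  # AAAABBBCCD
```

Data with few repeated runs gains nothing from this scheme, since every single character still needs its own pair.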
Dictionary-based methods store a dictionary and a series of keys. Each key corresponds to a value in the dictionary, which can then be used to reassemble the original message.
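The idea can be sketched with a word-level dictionary, where each key is the position of a word in the dictionary. This is a simplified illustration that assumes words are separated by single spaces; practical dictionary coders work on byte sequences rather than words.

```python
def dictionary_encode(text: str) -> tuple[list[str], list[int]]:
    """Replace each word with a key that indexes into a dictionary of unique words."""
    dictionary = []  # unique words, in order of first appearance
    index = {}       # word -> key, for fast lookup while encoding
    keys = []
    for word in text.split(" "):
        if word not in index:
            index[word] = len(dictionary)
            dictionary.append(word)
        keys.append(index[word])
    return dictionary, keys


def dictionary_decode(dictionary: list[str], keys: list[int]) -> str:
    """Reassemble the original message by looking up every key."""
    return " ".join(dictionary[k] for k in keys)


dictionary, keys = dictionary_encode("the cat sat on the mat")
print(dictionary)  # ['the', 'cat', 'sat', 'on', 'mat']
print(keys)        # [0, 1, 2, 3, 0, 4]
print(dictionary_decode(dictionary, keys))  # the cat sat on the mat
```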
Encryption is the transformation of data from one form (the plaintext) to another (the ciphertext) such that an unauthorised third party can't understand it. The encryption algorithm is the cipher, which requires a secret key to decrypt the message.
The Caesar Cipher is a substitution cipher where each letter of the alphabet is shifted by a given number of letters (the key). The Caesar Cipher is insecure as there are only 25 possible keys, making it easy to brute-force. Additionally, each character always maps to a specific output character, making cryptanalysis techniques such as frequency analysis possible.
The Vernam Cipher requires a one-time pad that is equal in length to or longer than the message being sent. The one-time pad must be truly random and only ever used once. Each plaintext character is XORed with the corresponding one-time pad character. The random key means the distribution of characters in the ciphertext is also random, making the Vernam Cipher immune to cryptanalysis.
While the Vernam Cipher is theoretically unbreakable, other ciphers are only computationally secure. In theory, with enough ciphertext and time, every other cryptographic algorithm can be broken.
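A short sketch of the Caesar Cipher, and of brute-forcing it by trying all 25 keys, might look like this. It only shifts the unaccented letters A-Z and a-z and leaves other characters unchanged; the function names are assumptions for this example.

```python
def caesar(text: str, key: int) -> str:
    """Shift each letter by `key` places, wrapping around the alphabet."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + key) % 26 + base))
        else:
            result.append(ch)  # leave spaces, digits and punctuation alone
    return "".join(result)


def brute_force(ciphertext: str) -> list[str]:
    """Try every possible key and return all 25 candidate plaintexts."""
    return [caesar(ciphertext, -key) for key in range(1, 26)]


ciphertext = caesar("HELLO", 3)
print(ciphertext)              # KHOOR
print(caesar(ciphertext, -3))  # HELLO
print("HELLO" in brute_force(ciphertext))  # True
```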
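The XOR step can be sketched on bytes as shown here. Note the assumption: `secrets.token_bytes` draws from the operating system's cryptographic random source, which stands in for a truly random pad in this illustration. The function name `vernam` is hypothetical.

```python
import secrets


def vernam(data: bytes, pad: bytes) -> bytes:
    """XOR every byte of the data with the corresponding byte of the one-time pad."""
    if len(pad) < len(data):
        raise ValueError("the one-time pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


message = b"ATTACK"
pad = secrets.token_bytes(len(message))  # stand-in for a truly random pad, used once
ciphertext = vernam(message, pad)

# XORing with the same pad a second time recovers the plaintext.
print(vernam(ciphertext, pad) == message)  # True
```

Because XOR undoes itself, the same function both encrypts and decrypts.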
Footnote on UTF-16
UTF-16 is actually a variable-width encoding that uses 16 or 32 bits per character.